Near-Duplicate News Detection Using Named Entities
Presenter
Erkan Uyar
MSc. Student
Computer Engineering Department
Bilkent University
The number of online documents on the Web keeps growing, and digital libraries continue to expand. As a result, detecting duplicate documents is becoming increasingly important. Advances in Internet technology have also increased the number of news agencies, and people now tend to read news online rather than buy newspapers. Because readers follow several news sources through a single news portal, and many of these sources receive their stories from the same news agency, duplicate news items occur frequently. Duplicate news documents create redundancy, and few users want to retrieve news items that contain identical or closely related information. Although several near-duplicate detection methods already exist, we developed a new approach based on named entities. In this approach, the named entities that appear in a news article serve as its identity and are used to characterize the article's content.
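The announcement does not say how the entity sets of two articles are compared. The sketch below illustrates the general idea under stated assumptions: each article's named entities are assumed to be already extracted by some named-entity recognizer, and Jaccard similarity over the entity sets is used as a stand-in measure. The function names and the 0.6 threshold are hypothetical and are not the presenter's method.

```python
from itertools import combinations


def entity_signature(entities):
    """Normalize a list of named-entity strings into a case-folded set (hypothetical helper)."""
    return {e.strip().casefold() for e in entities if e.strip()}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets are treated as dissimilar."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(articles, threshold=0.6):
    """Return (id1, id2, score) for article pairs whose entity sets reach the threshold.

    `articles` maps an article id to its already-extracted named entities.
    The threshold value is an illustrative assumption, not taken from the talk.
    """
    sigs = {doc_id: entity_signature(ents) for doc_id, ents in articles.items()}
    pairs = []
    # Compare every pair once; ids are sorted so the output order is deterministic.
    for (id1, s1), (id2, s2) in combinations(sorted(sigs.items()), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            pairs.append((id1, id2, round(score, 3)))
    return pairs


articles = {
    "a1": ["Ankara", "Bilkent University", "Anadolu Agency"],
    "a2": ["Ankara", "Bilkent University", "Anadolu Agency", "Istanbul"],
    "a3": ["Paris", "UNESCO"],
}
print(near_duplicates(articles))  # [('a1', 'a2', 0.75)]
```

Here articles a1 and a2 share three of four distinct entities (similarity 0.75), so they are flagged as near-duplicates, while a3 shares none. Comparing every pair is quadratic in the number of articles; a real system would likely index articles by entity first so that only candidates sharing at least one entity are compared.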
DATE:
Monday, 15 December 2008 @ 17:00
PLACE:
EA 409